ABSTRACT


An ensemble consisting of 150 Ziphius cavirostris vocalizations was compiled
from acoustic data recorded at two High-frequency Acoustic Recording Package (HARP)
locations: the Naval Postgraduate School (NPS)'s Point Sur HARP and Scripps
Institution of Oceanography (SIO)'s Site H HARP. The ensemble was analyzed via a
principal component analysis (PCA). The results of the PCA verified the statistical
robustness of the signal and yielded one dominant mode, which accounted for 73% of the
variance. The dominant mode was used to create a kernel for a matched filter detection
scheme. The subsequent detector output was statistically evaluated against a ground
truth. The ground truth, created by visually inspecting over 170 minutes of data recorded
by NPS's Data Acquisition System (DAS) at the Southern California Offshore Range
(SCORE), identified 28,434 Ziphius clicks. The inability to visually discriminate a signal
embedded in noise created a conservatively biased ground truth estimate, which increased
the detector's measured false alarm rate. At an acceptable 0.1% false alarm rate, the detector had
an overall 44% probability of detection. A further assessment of the detector's
performance divided the data into two categories: cluttered and uncluttered. At a false
alarm rate of 0.1%, the probability of detection was 26% and 61%, respectively.
I. INTRODUCTION


A. BACKGROUND 

In an ongoing legal dispute with the Natural Resources Defense Council
(NRDC), the U.S. District Court for the Central District of California has imposed
restrictions that affect the Navy's ability to operate Mid-Frequency
Active (MFA) sonar. MFA sonar has been operated for over 60 years and is the primary
method used to localize submarines (Hastings, 2008). These legal restrictions impede the
combat proficiency and advancement of the U.S. Pacific Fleet's top priority: anti-submarine
warfare (ASW). On January 23, 2007, under Title 16, Section 1371(f) of the
U.S. Code, the Deputy Secretary of Defense invoked a two-year National Defense
Exemption (NDE) under the Marine Mammal Protection Act (MMPA), which includes 29
mitigation measures. These 29 mitigation measures were developed with the
National Marine Fisheries Service (NMFS) to reduce the potential impacts of MFA sonar
on marine mammals through increased aerial monitoring and visual surveying (Federal
Register, 2008).

Recent mass stranding incidents involving beaked whales, both temporally and
geographically coincident with naval emissions of underwater sound, coupled with these
high-profile legal ramifications, have increased the need for more effective methods of
detection and classification. Cuvier's beaked whales (Ziphius cavirostris) are among
the species of greatest concern with respect to the potential effects of
anthropogenic sound (Zimmer et al., 2005; Cox et al., 2006). Since 1960, more than 40
mass strandings of Cuvier's beaked whales have been reported worldwide (Cox et al.,
2006). This species alone comprises over 80 percent of all marine mammals involved in
stranding incidents (Hildebrand, 2005). Further amplifying the issue, knowledge of this
species is severely limited. Cuvier's beaked whales are difficult to
study and identify via traditional visual surveying techniques because of their
lengthy deep-diving behavior, typically spending up to 40 minutes beneath the
surface of the water for a single dive (Barlow et al., 2006). Cuvier's beaked whales
spend less than 3 minutes at the surface between dives, leaving little time for visual
identification (Barlow, 1999). This species has also been observed to surface without any
visible blow or splash (Ferguson et al., 2006). In an experiment utilizing acoustic
recording tags (DTAGs) attached to a Cuvier's beaked whale, the average depth recorded
during a deep diving period was approximately 850 m, with vocalizations ceasing when
the whale was within 200 meters of the surface (Johnson et al., 2004; Tyack et al., 2006).
The accuracy of visual identification is further limited by many additional factors,
including sea state, visibility, daylight, and the individual observer's experience level
and biases. The development of an automated passive acoustic detector would provide
the U.S. Navy with the capacity to observe this species' presence and movement under
conditions not appropriate for visual surveys. Furthermore, passive acoustic techniques
are more cost-effective, require less underway time, allow for continuous monitoring, and
could provide information on seasonal and diurnal population patterns.

B. THESIS OBJECTIVES 

There are two primary objectives for this thesis. The first objective is to develop
a kernel for the vocalizations of Cuvier's beaked whales. This will be achieved by
conducting a principal component analysis (PCA) upon an ensemble of extracted Ziphius
clicks. The kernel will then be used in an automated passive acoustic matched filter
detection scheme.

The second objective of this thesis is to assess the performance of the automated
passive acoustic detector. This will be achieved by first creating a ground truth count of
Ziphius vocalizations. Then, Receiver Operating Characteristic (ROC) curves portraying
the detector's performance will be constructed via a statistical comparison of the ground
truth to the detector output.
C. OUTLINE


The remainder of this thesis consists of three chapters. Chapter II describes the
methods used to achieve the two primary objectives. To create the kernel, a principal
component analysis was conducted upon an ensemble composed of 150 randomly
selected Cuvier's beaked whale vocalizations. To assess the detector's performance, a
ground truth was created by visually reviewing 174.8 minutes of the Naval Postgraduate
School (NPS)'s Data Acquisition System (DAS) recordings from the Southern California
Offshore Range (SCORE) and identifying 28,434 occurrences of a Ziphius click. Chapter
III contains the ROC curves and a discussion of the automated passive acoustic detector's
performance relative to the ground truth. Chapter IV presents the conclusions of this
thesis.
II. METHODOLOGY


A. KERNEL DEVELOPMENT 

1. Signal Characterization 

A characterization of the Ziphius signal, derived from recent research results,
initiated the kernel development process. Although some odontocetes (toothed whales)
produce both clicks and whistles, Cuvier's beaked whales are known only
to click (Hildebrand, 2005). Recent research suggests that the clicks of Cuvier's beaked
whales exhibit a unique spectral and temporal structure that differs significantly from the
clicks of other non-ziphiid toothed whales. A unique signal is favorable for
automated acoustic monitoring.

In September of 2003, research conducted by attaching a digital acoustic
recording tag (DTAG) directly to a whale reported a click duration of 175 μs with an
interclick interval (ICI) of 0.4 seconds. The spectrum swept upwards from 30 to 48 kHz
(Johnson et al., 2004). One year later, the NATO Undersea Research Center (NURC) and
Woods Hole Oceanographic Institution (WHOI) collaborated in a concentrated attempt to
build upon the sparse knowledge of Cuvier's beaked whales. Their research found a
click duration of 200 μs and an average ICI of 0.4 seconds. The spectrum was frequency
modulated (FM) and swept upwards from 35 to 45 kHz (Zimmer et al., 2005). However,
it should be noted that both of these studies used acoustic recording devices with a cutoff
frequency of 48 kHz; hence, they provide no information on the upper frequency
limit of the click energy. Consequently, the measured click durations are shortened and the
bandwidths are narrowed.

On September 26, 2005, further research by NURC, utilizing a towed array,
reinforced and expanded upon the DTAG Ziphius signal characterization. This recording
method was able to capture the entire bandwidth of the signal. A Passive Acoustic
Monitoring (PAM) system with a bandwidth of 96 kHz was activated after a visual
sighting of two Cuvier's beaked whales initiating a deep dive. The PAM recordings
depicted an upswept energy range of 16 to 60 kHz with a center frequency of 40 kHz.
The click duration was approximately 300 μs with an average ICI of 0.38 s (Pavan et al.,
2006). These results mark the first time a sub-surface detection device was able to verify
characteristic features of the DTAG recordings. The increase in the signal's measured duration and
bandwidth is attributed to the increased bandwidth of the recording method.

Additional DTAG research indicates significant differences in signal
characteristics between Cuvier's beaked whales and other toothed whales. The Ziphius
signal was characterized by an upswept FM pulse, an average click duration of 200 to
300 μs, and an ICI of 0.4 s (Tyack et al., 2006). Overall, recent research indicates a
unique signal structure that is favorable for automated acoustic detection.

2. Ensemble Creation 

Designed specifically to monitor marine mammals, the High-frequency Acoustic
Recording Package (HARP) was developed by the Scripps Institution of Oceanography
(SIO). The HARP, which is capable of a 200 kHz sampling rate and nearly 2 TB of data
storage per instrument deployment, is ideal for recording Ziphius's higher frequency
clicks over long periods of time. For recordings made at a sampling rate of 200 kHz, 55
days of continuous recording is possible (Wiggins and Hildebrand, 2007). To study the
signals of Cuvier's beaked whales, long-term, broadband underwater acoustic data
recorded by HARPs were obtained from two different locations with known Ziphius
activity: Point Sur and the San Nicolas Basin. The NPS Point Sur HARP is moored at
36°17.95' N, 122°23.63' W, approximately 40 km off the central coast of California at a
water depth of 1390 meters. Acoustic data recorded during the NPS Point Sur HARP's
second deployment, which spanned from 24JAN07 until 17JUL07, comprised one half of
the data used in the ensemble creation. SIO provided data, spanning from
22AUG07 to 24AUG07, from their Site H HARP, located just east of the San Nicolas
Basin at a depth of 1013 meters.
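The short Python sketch below offers a rough consistency check on the quoted 55-day endurance. It converts storage capacity into days of continuous recording. Several inputs are assumptions made for illustration: 16-bit samples, a single channel, no file or header overhead, and 1.9 TB as a stand-in for "nearly 2 TB". The function name `recording_days` is hypothetical.

```python
def recording_days(storage_bytes, fs_hz, bytes_per_sample=2, channels=1):
    """Days of continuous recording that fit into storage_bytes.

    Assumes uncompressed samples of bytes_per_sample bytes each (16-bit by
    default), the given number of channels, and no file or header overhead.
    """
    bytes_per_day = fs_hz * bytes_per_sample * channels * 86_400
    return storage_bytes / bytes_per_day


# "Nearly 2 TB" is represented here by an assumed 1.9e12 bytes.
print(f"{recording_days(1.9e12, 200e3):.1f} days")  # -> 55.0 days
```

Under these assumptions, one day at 200 kHz consumes about 34.6 GB. This is why roughly 1.9 TB corresponds to the cited 55 days.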
To evaluate the statistical robustness of the signal, two ensembles were randomly
extracted from the HARP data sets: one a compilation of Ziphius clicks, the second a
compilation of ambient noise segments. The click ensemble was used to generate a
kernel containing the statistically dominant characteristics of the signal. The statistical
analyses performed on the click ensemble were duplicated on the noise ensemble to ensure
that the ambient noise was not correlated. Triton software, courtesy of Wiggins (personal
communication), was used to visually inspect the data and extract 150 random
vocalizations, following the qualitative characterizations of the Ziphius click from
previous research. The click ensemble consisted of 75 samples from each
HARP location. An example of one click extraction from each location is shown in
Figure 1. The 100-sample ensemble of random ambient noise segments consisted of 50
noise segments from each HARP dataset.


[Figure 1 panels: a) NPS's Point Sur HARP; c) SIO's Site H HARP; time-series x-axes: Time]

Figure 1. Two examples of Ziphius clicks extracted from HARP data: a) Spectrogram
of a click extracted from NPS's Point Sur HARP with click energy upsweeping
from 35 to 50 kHz. b) Time series of the click corresponding to the spectrogram
above it. c) Spectrogram of a click extracted from SIO's Site H HARP with click
energy upsweeping from 35 to 50 kHz. d) Time series of the click corresponding
to the spectrogram above it.
Although both ensembles were created solely from HARP data, the subsequent
analyses had to account for the bandwidth and sampling frequency differences between a
HARP and a SCORE hydrophone in order to produce results applicable to
data from either acoustic recording method. A SCORE hydrophone has a band-pass filter
installed that limits the frequency response to a bandwidth of 8-40 kHz. At the time of
the data collection, the sampling rate of the SCORE hydrophone was set at 80 kHz,
whereas the HARP was set at 200 kHz. To ensure consistency between recording
methods, the ensembles were processed into two distinct sub-sets.

To make the ensembles applicable to a SCORE hydrophone, the first step
was to decrease the sampling rate from 200 kHz to 80 kHz. To ensure the bandwidth was
consistent with a SCORE hydrophone, a band-pass filter with a 15-40 kHz passband
was applied to the ensemble. The lower cutoff of 15 kHz was chosen to
eliminate noise at frequencies lower than the Ziphius signal. The sampling
rate of the ensembles was then increased to 1 MHz to increase
the temporal resolution and decrease the potential for correlation quantization errors. For clarity,
this first sub-set will be referred to as the ensembles that were band-pass filtered between
15-40 kHz.

The ensembles were also processed for applicability to HARP data in a second
sub-set. Both ensembles were band-pass filtered between 15-60 kHz to eliminate noise at
frequencies lower than the Ziphius signal. The upper cutoff of 60 kHz allows for the
inclusion of more high-frequency click energy. As in the first sub-set, the sampling
frequency of the click and noise segment ensembles was increased to 1 MHz.
This second sub-set will be referred to as the ensembles that were band-pass filtered
between 15-60 kHz.
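The two processing chains can be sketched in Python with SciPy as follows. This is a minimal illustration, not the processing actually used in the thesis. Several details are assumptions:

- the Butterworth filter type and order;
- zero-phase filtering;
- the helper name `condition_click`.

Polyphase resampling (`scipy.signal.resample_poly`) handles both the 200-to-80 kHz decimation and the upsampling to 1 MHz.

```python
from math import gcd

import numpy as np
from scipy.signal import butter, resample_poly, sosfiltfilt


def condition_click(x, subset, fs_in=200_000, fs_out=1_000_000, order=4):
    """Return (y, fs_out): x resampled and filtered for one of the two sub-sets.

    subset="score": fs_in -> 80 kHz, keep 15-40 kHz, upsample to fs_out.
    subset="harp":  keep 15-60 kHz at fs_in, upsample to fs_out.
    Filter type and order are assumptions, not the thesis's settings.
    """
    if subset == "score":
        fs = fs_in * 2 // 5                      # 200 kHz -> 80 kHz
        # resample_poly's anti-aliasing filter removes energy above the
        # new 40 kHz Nyquist limit, which supplies the 40 kHz upper edge.
        x = resample_poly(x, 2, 5)
        sos = butter(order, 15_000, btype="highpass", fs=fs, output="sos")
    elif subset == "harp":
        fs = fs_in
        sos = butter(order, [15_000, 60_000], btype="bandpass", fs=fs, output="sos")
    else:
        raise ValueError("subset must be 'score' or 'harp'")
    x = sosfiltfilt(sos, x)                      # zero-phase filtering
    g = gcd(fs_out, fs)                          # e.g. 80 kHz -> 1 MHz is 25/2
    return resample_poly(x, fs_out // g, fs // g), fs_out
```

At 80 kHz, the 40 kHz upper band edge sits exactly at the Nyquist frequency, and a digital band-pass filter cannot place a cutoff at Nyquist itself. The sketch therefore relies on the resampler's anti-aliasing filter for that edge and applies only a 15 kHz high-pass afterward.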

3. Quantitative Signal Evaluation 

A correlation analysis was performed on both sub-sets of ensembles to assess the
statistical robustness of the Ziphius signal and evaluate the feasibility of developing
an automated detector. First, the 150 samples of both click ensembles were demeaned,
normalized, and aligned via circular shifting. For both sub-sets of click ensembles, the
click-to-click cross-correlation results indicated a statistically high level of correlation
among the samples. Figure 2 depicts the values for the 150 click-to-click
cross-correlations for the click ensemble that was band-pass filtered between 15-40 kHz. These
results indicate that the signal is statistically robust. A cross-correlation was also
performed on each sub-set's ensemble of random noise segments to ensure that the noise
was not correlated. For both sub-sets' noise segment ensembles, the noise-to-noise
cross-correlation results indicated a statistically low level of correlation among the
samples.
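A minimal NumPy sketch of the demeaning, normalization, and circular-alignment steps appears below. It ends by computing the zero-lag click-to-click correlation matrix. The text does not state which click served as the alignment reference. Aligning every click to one reference column (here `ref_index`) is therefore an assumption, and both function names are hypothetical.

```python
import numpy as np


def prepare_ensemble(clicks, ref_index=0):
    """Demean, unit-normalize, and circularly align equal-length clicks.

    clicks: array of shape (n_samples, n_clicks), one click per column.
    Each click is circularly shifted so that its cross-correlation peak with
    the reference click (column ref_index) falls at zero lag.
    """
    X = clicks - clicks.mean(axis=0, keepdims=True)
    X = X / np.linalg.norm(X, axis=0, keepdims=True)
    ref_f = np.fft.fft(X[:, ref_index])
    aligned = np.empty_like(X)
    for k in range(X.shape[1]):
        # Circular cross-correlation via the FFT: xc[m] = sum_n ref[n] x[n-m].
        xc = np.fft.ifft(ref_f * np.conj(np.fft.fft(X[:, k]))).real
        lag = int(np.argmax(xc))
        aligned[:, k] = np.roll(X[:, k], lag)    # roll(x, m)[n] = x[n-m]
    return aligned


def correlation_matrix(aligned):
    """Zero-lag correlation coefficients between every pair of columns."""
    return aligned.T @ aligned
```

Because each column is demeaned and scaled to unit energy, the entries of `correlation_matrix` fall between -1 and 1, and the diagonal is exactly 1. The same two functions apply unchanged to the noise-segment ensemble.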


[Figure 2 plot: Click to Click Cross-Correlation for Ensemble BPF 15-40 kHz; x-axis: Ensemble Clicks]

Figure 2. Click-to-click cross-correlation results for the click ensemble that was band-pass
filtered between 15-40 kHz: The correlation values range from 0 to 1, with a
value of 1 indicating a perfect correlation.
4. Principal Component Analysis

Once the signal was determined to be quantitatively robust, both sub-sets of click
ensembles were analyzed via a principal component analysis (PCA) to further evaluate
the potential for a matched filter detection scheme. The goal of the PCA is to isolate the
desired signal from the noise. PCA is a useful statistical technique that was introduced in
1901 by Karl Pearson. PCA is defined as an orthogonal linear transformation that
converts data into a new coordinate system such that the greatest variance of the data
comes to lie on the first coordinate, often referred to as the first principal component (Shaw,
2003). The second greatest variance lies on the second coordinate, the third greatest
variance lies on the third coordinate, and so on. This method of decomposing the data
makes it possible to retain the characteristics of the signal that contribute most to its
variance by keeping the components with the highest variance and ignoring the
components with the least variance.

Mathematically, the principal components can be obtained by solving the
following eigenvalue-eigenvector equation:

$$A A^{T} y_i = \lambda_i y_i \qquad (1)$$

where $A$ is a data matrix with 150 columns, and each column contains one realization of
a realigned click from the ensemble. $A A^{T}$ is the data covariance matrix, and $\lambda_i$ is
the eigenvalue, which is the variance resolved by the $i$th component, $y_i$.

The PCA was performed via a Matlab routine to yield the components and the
associated variances. For the click ensemble that was band-pass filtered between 15-40
kHz, the PCA's first component contained 73% of the variance. The remaining
components each had values of less than 6% and correspond to noise, including multipath
contamination. The results of this PCA indicate that there is only one dominant
component. The results of the PCA for the first sub-set are shown in Figure 3.
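The NumPy sketch below illustrates Equation (1) and the quoted variance fractions. It eigen-decomposes A A^T and reports each component's share of the total variance. It stands in for the unspecified Matlab routine and is not that routine. The function name `pca_kernel` and the sign convention for the returned kernel are assumptions.

```python
import numpy as np


def pca_kernel(A):
    """PCA of a click ensemble via the eigen-decomposition of A A^T.

    A: (n_samples, n_clicks) matrix of aligned, normalized clicks.
    Returns the first component (used as the kernel) and each component's
    fraction of the total variance, largest first.
    """
    C = A @ A.T                                  # (n_samples, n_samples)
    evals, evecs = np.linalg.eigh(C)             # ascending for symmetric C
    order = np.argsort(evals)[::-1]              # sort largest first
    evals = np.clip(evals[order], 0.0, None)     # drop tiny negative round-off
    evecs = evecs[:, order]
    frac = evals / evals.sum()
    kernel = evecs[:, 0]
    # Eigenvector sign is arbitrary; orient it to match the ensemble mean.
    if kernel @ A.mean(axis=1) < 0:
        kernel = -kernel
    return kernel, frac
```

The left singular vectors of A from `np.linalg.svd(A)` give the same components, with the squared singular values equal to the eigenvalues. For long clicks, the SVD avoids forming the n_samples-by-n_samples matrix.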
[Figure 3 plot: Principal Component Analysis for the Click Ensemble BPF 15-40 kHz]

Figure 3. Principal component analysis results for the click ensemble that was band-pass
filtered between 15-40 kHz: The dominant component is emphasized with a
red circle and contains 73% of the variance.

The second sub-set, which was band-pass filtered between 15 and 60 kHz, also
produced only one dominant component. The first component of the second sub-set's
PCA contains 66% of the variance. The remaining components correspond to noise,
including multipath contamination. Both PCAs indicate that the first component can be
used as a kernel in a matched-filter detection scheme. The first components of each
sub-set's PCA were extracted to be used as kernels, shown in Figure 4. The first sub-set's
kernel is noticeably shorter in duration than that of the second sub-set. This is because it
was created from a click ensemble with a narrower bandwidth, 15-40 kHz vice 15-60
kHz; therefore, some of the higher frequency click energy was excluded. The
fraction of variance captured by the first sub-set's kernel is also higher, 73% in comparison
to 66% for the second sub-set's kernel. This is because the first sub-set was created from a
narrower bandwidth and therefore contained less in-band noise.
[Figure 4 panels: a) Kernel from PCA BPF 15-40 kHz; b) Kernel from PCA BPF 15-60 kHz]

Figure 4. Kernels developed for use in a matched-filter detection scheme: a) Kernel
created from the PCA of the click ensemble that was band-pass filtered between
15-40 kHz, for use with SCORE hydrophone data; b) Kernel
created from the PCA of the click ensemble that was band-pass filtered between
15-60 kHz, for use with HARP data.

This thesis does not utilize or assess the kernel created from the second sub-set of 
data that was band-pass filtered between 15-60 kHz. Follow-on research would be 
valuable in assessing the performance of this kernel in a matched-filter detection scheme 
for comparison to the performance of the first sub-set’s kernel. From this point forward, 
all references to a kernel are with respect to the first sub-set of data that was band-pass 
filtered between 15-40 kHz. 

One final analysis was performed to further investigate the robustness of the
Ziphius click and evaluate the performance of the kernel on the click ensemble: a
cross-correlation of the kernel to the entire click ensemble. This cross-correlation is shown in
Figure 5. The majority of the clicks within the ensemble are highly correlated to the
kernel. The cross-correlation of the kernel to click 51 produces a high correlation value
with only one dominant arrival, indicating minimal multipath effects. However, it
should be noted that a few of the clicks have a lower cross-correlation coefficient. For
example, click 98 is not as highly correlated to the kernel. This particular
cross-correlation example clearly indicates multiple peaks, which can be attributed to multipath
effects. The normalization of the signal in the presence of multipath arrivals is
responsible for decreasing the correlation coefficient. Without normalization, the peak
correlation value for this particular example would be consistent with the higher values of
the other cases. Overall, the results of the kernel-to-click-ensemble cross-correlation are
further evidence that the Ziphius click is a robust signal, and they indicate that the kernel can
feasibly be used in a matched-filter detection scheme.

A cross-correlation of the kernel to the noise segment ensemble was also 
performed. All of the noise segments were poorly correlated to the kernel. This is an 
expected result and verifies that the high correlation coefficients of the kernel to click 
ensemble are not a coincidence. 
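The sketch below illustrates how normalization affects clicks with multipath arrivals, as noted for click 98. It computes the peak of the normalized cross-correlation between a kernel and a click. The function name and the unit-energy normalization are assumptions about how the coefficients were formed.

```python
import numpy as np


def peak_correlation(kernel, click):
    """Peak of the normalized cross-correlation between kernel and click.

    Both inputs are demeaned and scaled to unit energy, so a click that is
    an exact (shifted) copy of the kernel scores 1.0. When a click's energy
    is split between a direct and a multipath arrival, scaling the whole
    click to unit energy lowers the direct arrival's share, so the peak
    drops below 1.
    """
    k = kernel - kernel.mean()
    c = click - click.mean()
    k = k / np.linalg.norm(k)
    c = c / np.linalg.norm(c)
    return float(np.max(np.correlate(c, k, mode="full")))
```

A click that is an exact copy of the kernel scores 1.0. Now add a hypothetical multipath arrival: a second, non-overlapping copy at 0.8 times the amplitude. This raises the click's energy by a factor of 1.64 and lowers the peak to 1/√1.64 ≈ 0.78. The unnormalized correlation at the direct arrival, however, is unchanged.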


[Figure 5 panel a): Cross-Correlation of Kernel to Click Ensemble]

Figure 5. Cross-correlation of the kernel to the click ensemble: a) A majority of the
correlation coefficients indicate that the kernel is highly correlated to the clicks of
the ensemble. b) Click 51 is highly correlated to the kernel. c) Click 98 is poorly
correlated to the kernel.
B. AUTOMATED MATCHED FILTER DETECTOR SCHEME

The statistically dominant first component produced via the PCA can be used as a
kernel in a matched filter detection scheme. The kernel was cross-correlated with
acoustic data obtained from NPS's Data Acquisition System (DAS) recordings at
SCORE, using a matched filter detector designed by Chris Miller (personal
communication) of NPS's Ocean Acoustic Laboratory (OAL). The SCORE data that was
fed into this detector came from a hydrophone at a depth of 1,497 meters, located at
32°50.62' N, 119°5.26' W in the San Nicolas Basin.

The automated passive acoustic matched-filter detection schematic is portrayed in
Figure 6. The first step in Miller's detector was to cross-correlate the kernel with the
SCORE data. Then, peaks in the output of this first box were picked above a given threshold.
The final box of Miller's detector utilized a rank-ordered culling system with a culling
window of +/- 390 μs. The culling window size of +/- 390 μs was selected because it is
exactly twice the length of the kernel. This step removes a majority of the multipath
arrivals as well as the side lobes that were introduced by the correlation and the sinusoidal
nature of the detector kernel. Removing the side lobes by culling the data cleans up the
output and significantly reduces the number of false alarms.
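The three boxes of the detector can be sketched as follows. This is an illustrative re-creation under stated assumptions, not Miller's implementation. The following are all assumptions:

- taking the absolute value of the correlator output;
- using `scipy.signal.find_peaks` for peak picking;
- the greedy strongest-first culling loop;
- the function name `detect_clicks`.

```python
import numpy as np
from scipy.signal import correlate, find_peaks


def detect_clicks(x, kernel, fs, threshold, cull_s=390e-6):
    """Hypothetical matched-filter detector: correlate, pick peaks, cull.

    Returns detection times (s) and correlator values, in time order.
    """
    # Box 1: cross-correlate the data with a unit-energy kernel. With
    # mode="same", a peak marks roughly the centre of the matched click.
    k = kernel / np.linalg.norm(kernel)
    r = np.abs(correlate(x, k, mode="same"))     # |.| is an assumption
    # Box 2: keep local maxima above the detection threshold.
    peaks, _ = find_peaks(r, height=threshold)
    # Box 3: rank-ordered culling, strongest peak first; any weaker peak
    # within +/- cull_s of an already accepted peak is discarded.
    half = int(round(cull_s * fs))
    kept = []
    for p in peaks[np.argsort(r[peaks])[::-1]]:
        if all(abs(p - q) > half for q in kept):
            kept.append(p)
    kept = np.sort(np.asarray(kept, dtype=int))
    return kept / fs, r[kept]
```

Usage: `times, peaks = detect_clicks(x, kernel, fs=80_000, threshold=thr)`. Sweeping `thr` traces out the operating points for an ROC curve. Because peaks are processed strongest-first, a side lobe or weaker multipath arrival within ±390 μs of a stronger peak is discarded, not the reverse.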



Figure 6. Automated passive acoustic matched filter detection schematic: The design
and development of the detector are courtesy of Miller (personal communication).
C. GROUND TRUTH CREATION

1. Selection Criteria

To evaluate the performance of the detector, the detector output must be
compared to an assumed ground truth. The ground truth was created by visually
inspecting SCORE data and annotating each instance of an observed Ziphius click. A total of 174.8
minutes of acoustic data recorded on 23FEB08 by a SCORE hydrophone was reviewed in
the ground truth creation process. Because the duration of a Ziphius signal is less than 400 μs,
the recording was reviewed at a reduced time scale: it was divided into 12,800
smaller segments, each with a length of 0.82 s. The final log of presumably positive
Ziphius vocalizations was then used to statistically analyze the automated detector's
performance by comparing hits, false alarms, and misses at varying
threshold levels.
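The segment count follows directly from the reviewed duration and the segment length. As a worked check:

```latex
N_{\text{seg}} \approx \frac{174.8\ \text{min} \times 60\ \text{s/min}}{0.82\ \text{s}}
  = \frac{10\,488\ \text{s}}{0.82\ \text{s}} \approx 12\,790 \approx 12{,}800
```

The 0.82 s segment length is therefore consistent with the quoted 12,800 segments.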

The ground truth creation proved to be the most arduous and time-consuming
aspect of this research. Even at a reduced time scale, the certainty of the ground truth
remained dependent upon the reviewer's discernment. An initial ground truth was deliberately
discarded because, as the ground truth creation process progressed, the reviewer's experience
and signal discrimination improved, producing unacceptable inconsistencies within that
first ground truth.

The subsequent ground truth creation process incorporated specific criteria to
reduce subjectivity. The first criterion mandated that a click selected for inclusion in
the ground truth must have continuous energy between 22.5 and 35 kHz. This standard
was adopted under the notion of continuous eye integration, meaning the eye has the
ability to visually connect minuscule gaps within the click energy of the spectrogram. If
the first condition was not met, the second criterion directed ground truth inclusion if a
click was part of a distinctive click train, consisting of regular, repeated clicks with a
constant ICI. Figure 7 exemplifies an instance in which the energy criterion was not met;
however, a distinctive click train was present. Thus, the clicks not spanning 22.5 to 35
kHz were still included in the ground truth, having met the second criterion. The energy
criterion is visibly enhanced by the overlay of two solid black lines on the
spectrogram. The dashed blue lines on the spectrogram and the blue stars on the time
series indicate identified clicks. The final criterion established that only one click would
be selected in the case of a cluster. A cluster consisted of multiple clicks that were
visually indistinguishable from one another at the prescribed time scale. Although still
subjective, these criteria allowed for the creation of a more objective ground truth. In total, 28,434 clicks
were identified in the 174.8 minutes of data that was reviewed.



[Figure 7 panel title: Spectrogram for Record 129: Window 183: Start Time: 02-23-2008 09:18:58.008]

Figure 7. Ground truth creation example: The upper panel is a spectrogram, and within
it the energy criterion is exemplified by the solid black lines spanning 22.5-35
kHz. The lower panel is the corresponding time series of the SCORE data. The
click energy does not span the entire width of the energy criterion; however, there
is a distinctive click train. The dashed blue lines on the upper panel and
corresponding blue stars on the lower panel represent the identification of a click
utilizing the second criterion, which directs the selection of a click if it is part of
a distinct click train. This is also an example of a time period in which the Ziphius
click could be distinguished among competing signals. Time periods such
as this were designated as “clutter” for the subsequent statistical analysis.




2. Statistical Analysis Exclusions 

The evaluation of the detector output was subject to bias introduced during the ground
truth creation. The acoustic signatures of other marine mammals occur within approximately the same
frequency range as that of a Cuvier's beaked whale. Figure 8 (National Resources
Council, 2003) depicts these overlapping frequency ranges of vocalizations. Visual
surveys conducted from July 2006 to April 2007 identified several of these species of
marine mammals with overlapping vocalization frequencies in the SCORE. In addition
to Cuvier's beaked whales, Risso's dolphins, Pacific white-sided dolphins, sperm
whales, orcas, Baird's beaked whales, false killer whales, and humpback whales have
all been found in the SCORE (Hildebrand, 2007).
Figure 8. Representative vocalizations of marine mammals (National Resources
Council, 2003): Tonal vocalizations are plotted in red; impulsive vocalizations
are plotted in blue. The thicker lines represent frequencies near maximum energy
and the thinner lines indicate the total range of frequencies. The numbers above
the lines indicate measured source levels in dB re 1 μPa at 1 m.
The overlapping frequency ranges of other marine mammals' vocalizations and Cuvier's beaked
whale vocalizations create uncertainty within the ground truth. To diminish this uncertainty, the
time periods containing such indistinguishable signals were purposefully excluded from the
subsequent statistical analysis. Similarly, time periods in which the data recordings were
interrupted and/or turned off were also eliminated. Figure 9 is an example of a time
period that was deliberately removed from the ground truth due to its indistinguishable
clutter. A total of 28.89 minutes was excluded from the statistical
analysis.




Figure 9. Ground truth exclusion due to indistinguishable clutter: The upper panel is a 
spectrogram and the lower panel is the corresponding time series for the data that 
was reviewed to create the ground truth. This is an example where the signal was 
indistinguishable due to the presence of other marine mammals’ vocalizations. 
Time periods such as these were excluded from the statistical analysis because the 
Ziphius signal could not be visually distinguished from amongst the clutter. 
3. Clutter Categories

To further remove uncertainty from the remaining ground truth, a sub-category
consisting of un-excluded clutter was created. This category consists of time periods
wherein significant clutter was present; however, it differs from the previously discussed
excluded clutter because in this instance the Ziphius signal remained discernible. By
distinguishing the cluttered time periods from the non-cluttered time periods, two distinct
sets of statistics could be generated for the detector performance analysis. Figure 7
is an example where competing signals were present, yet the Ziphius signal could still
be distinguished among the clutter. A total of 20.83 minutes were designated as
un-excluded clutter.

4. Interclick Interval 

During the creation of the ground truth, an unexpected observation was made with
respect to the ICI. Previous research has cited an ICI of approximately 0.4 s for Cuvier's
beaked whales (Johnson et al., 2004; Zimmer et al., 2005; Pavan et al., 2006; Tyack et al.,
2006). The data inspected to create the ground truth consistently displayed Ziphius
vocalizations with a discernible ICI of approximately 0.05 s. One possible explanation for
this striking difference is that these are different animals vocalizing intermittently.
It is also possible that these are not Cuvier's beaked whales. However, the average 0.05 s
to 0.1 s ICI appears to be regular and is repeated consistently throughout the dataset.
Figure 10 depicts one of these time periods within the ground truth with a distinctive and
regular ICI. This particular example has an ICI of 0.05 s, which is not an uncommon
observation. The order-of-magnitude difference between the referenced literature and
these observations was unexpected. Further exploration of the ICI dynamics is tangential
and beyond the scope of this research.
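The sketch below illustrates how an ICI such as 0.05 s can be measured from the ground-truth log. It converts annotated click times into inter-click intervals. The `max_ici` gap threshold that separates click trains is an assumption, and the function name is hypothetical. The sketch does not separate interleaved trains from different animals, which is one of the possible explanations raised above.

```python
import numpy as np


def interclick_intervals(click_times, max_ici=1.0):
    """Inter-click intervals (s) from annotated click times.

    Intervals longer than max_ici are treated as gaps between click
    trains and dropped; max_ici is an assumed, tunable value.
    """
    t = np.sort(np.asarray(click_times, dtype=float))
    ici = np.diff(t)
    return ici[ici <= max_ici]
```

The median of the returned intervals is then a simple summary that can be compared with the roughly 0.4 s reported in the literature.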
[Figure 10 panels: Ground Truth Creation Spectrogram for Record 129: Window 509: Start Time: 02-23-2008 09:23:25.067; Corresponding Time Series of the Ground Truth Creation]

Figure 10. Ground truth example with a distinct 0.05 s ICI: The upper panel is a
spectrogram and the lower panel is the corresponding time series of the data that
was reviewed to create the ground truth. The solid black lines on the spectrogram
at 22.5 and 35 kHz are representative of the ground truth's energy criterion. The
dashed blue lines on the spectrogram and blue stars on the time series are
representative of click identifications. This figure depicts an ICI of 0.05 s.
III. DETECTOR PERFORMANCE RESULTS


The performance of the automated passive acoustic matched-filter detector was 
assessed by statistically comparing the detector’s output to the ground truth. A detector 
output hit that corresponded to a ground truth click identification was a correct hit. A 
detector output hit that did not correspond to a ground truth click identification was a 
false alarm. A ground truth click identification that did not have an associated detector 
output hit was a miss. A correct rejection occurred when there were no detector output 
hits and no ground truth click identifications. Probabilities of detection (P(D)) and 
probabilities of false alarms (P(FA)) were calculated by the following equations: 


$$P(D) = \frac{H}{H + M} \qquad (2)$$

$$P(FA) = \frac{FA}{FA + CR} \qquad (3)$$


where H is the number of correct hits, FA is the number of false alarms, M is the number of misses, and CR is the number of correct rejections.
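Equations (2) and (3) need counts of correct rejections, which only make sense if the record is divided into discrete decision intervals. The sketch below assumes fixed, non-overlapping time bins (a hypothetical scoring rule, not necessarily the one used in this research): a bin with both a detector hit and a ground truth click is a hit, a ground truth click alone is a miss, a detector hit alone is a false alarm, and an empty bin is a correct rejection. The bin width and function names are illustrative assumptions.

```python
import numpy as np

def score_bins(hit_times_s, truth_times_s, duration_s, bin_s):
    """Return (H, M, FA, CR) counts over fixed time bins of width bin_s.

    Assumption: each bin is scored once, so bin_s should be shorter than the ICI.
    """
    # round() guards against float error when duration_s is a multiple of bin_s
    n_bins = int(np.ceil(round(duration_s / bin_s, 9)))

    def occupied(times_s):
        idx = np.floor(np.asarray(times_s, dtype=float) / bin_s).astype(int)
        mask = np.zeros(n_bins, dtype=bool)
        mask[np.clip(idx, 0, n_bins - 1)] = True  # events at the very end fall in the last bin
        return mask

    hit = occupied(hit_times_s)
    truth = occupied(truth_times_s)
    h = int(np.sum(hit & truth))
    m = int(np.sum(~hit & truth))
    fa = int(np.sum(hit & ~truth))
    cr = int(np.sum(~hit & ~truth))
    return h, m, fa, cr

def detection_probabilities(h, m, fa, cr):
    """Equation (2) P(D) = H/(H+M) and Equation (3) P(FA) = FA/(FA+CR)."""
    p_d = h / (h + m) if h + m else float("nan")
    p_fa = fa / (fa + cr) if fa + cr else float("nan")
    return p_d, p_fa

counts = score_bins([0.135, 0.335, 0.91], [0.13, 0.33, 0.53], duration_s=1.0, bin_s=0.02)
print(counts)  # (2, 1, 1, 46)
print(detection_probabilities(*counts))
```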

By calculating the P(D) and P(FA) at varying threshold levels, Receiver Operating Characteristic (ROC) curves were created. The ROC curves are shown in Figure 11. Table 1 displays the detector performance results at the varying thresholds used to create the ROC curves. At an acceptable P(FA) of 0.1%, the automated passive acoustic matched-filter detector had an overall P(D) of 44%. The P(D) increased as the threshold was lowered; however, this improvement in detection also increased the P(FA). The tradeoff between P(D) and P(FA) is an important factor to consider when utilizing the detector. The category of data being processed by the detector also affected the P(D) and P(FA). As described in the previous chapter, the data were separated into two distinct categories for further detector assessment. At an acceptable P(FA) of 0.1%, the detector had a P(D) of 61% and 26% in uncluttered and cluttered data, respectively. The detector had a lower P(FA) when processing the uncluttered data than when processing the cluttered data.
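An ROC curve is traced by sweeping the threshold and recomputing Equations (2) and (3) at each level. The sketch below assumes the detector output has already been reduced to one peak value per time bin, with a matching ground truth label per bin; that input format and the function name are hypothetical. A bin counts as a detection when its peak reaches the threshold, so lowering the threshold can only raise both P(D) and P(FA), which is the tradeoff described above.

```python
import numpy as np

def roc_points(bin_peaks, bin_has_click, thresholds):
    """Return a list of (threshold, P(D), P(FA)) tuples, one per threshold."""
    peaks = np.asarray(bin_peaks, dtype=float)
    truth = np.asarray(bin_has_click, dtype=bool)
    points = []
    for thr in thresholds:
        detected = peaks >= thr          # declare a detection at or above the threshold
        h = int(np.sum(detected & truth))
        m = int(np.sum(~detected & truth))
        fa = int(np.sum(detected & ~truth))
        cr = int(np.sum(~detected & ~truth))
        p_d = h / (h + m) if h + m else float("nan")
        p_fa = fa / (fa + cr) if fa + cr else float("nan")
        points.append((thr, p_d, p_fa))
    return points

# Hypothetical per-bin peaks: three bins contain clicks, three do not.
pts = roc_points([0.006, 0.002, 0.0008, 0.004, 0.0003, 0.0001],
                 [True, True, True, False, False, False],
                 [5e-3, 1e-3, 5e-4])
for thr, p_d, p_fa in pts:
    print(f"threshold {thr:.0e}: P(D) = {p_d:.2f}, P(FA) = {p_fa:.2f}")
```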


[Figure 11 plot: ROC curves; horizontal axis P(FA)]
Figure 11. ROC curves to assess the detector’s performance; The orange curve is the 
overall performance of the detector, combining both the uncluttered and cluttered 
time periods. The detector performed best during the uncluttered time periods, 
shown by the green line. The detector performance was degraded during the 
cluttered time periods, shown by the blue line. 


DETECTOR PERFORMANCE RESULTS

THRESHOLD    UNCLUTTERED          CLUTTERED            COMBINED
             P(D)      P(FA)      P(D)      P(FA)      P(D)      P(FA)
5.00E-04     86.0815%  0.6355%    92.1336%  8.8121%    89.6917%  1.8614%
1.00E-03     79.1634%  0.3016%    90.3412%  5.7345%    85.8094%  1.1161%
1.25E-03     66.7453%  0.1314%    86.7028%  3.2769%    78.6127%  0.6030%
1.50E-03     55.2777%  0.0703%    80.7665%  2.0337%    70.4451%  0.3646%
1.75E-03     46.2646%  0.0416%    74.0090%  1.3750%    62.7993%  0.2415%
2.00E-03     39.0050%  0.0259%    67.9076%  0.9926%    56.2827%  0.1709%
3.00E-03     20.1885%  0.0059%    49.4160%  0.3792%    37.7543%  0.0619%
5.00E-03     7.4456%   0.0010%    26.3202%  0.0996%    19.0952%  0.0158%

Table 1. Automated passive acoustic matched-filter detector performance results for the uncluttered, cluttered, and combined time periods.
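The 0.1% operating point does not coincide with a tabulated threshold for the uncluttered and combined cases. Assuming linear interpolation between adjacent ROC points (an assumption; the method used to read the operating point off the curves is not stated), the Table 1 values reproduce the quoted 61%, 26%, and 44% P(D). The sketch below copies the P(FA) and P(D) columns from Table 1; the dictionary and function names are illustrative.

```python
import numpy as np

# (P(FA) %, P(D) %) columns copied from Table 1, ordered by increasing threshold.
TABLE_1 = {
    "uncluttered": ([0.6355, 0.3016, 0.1314, 0.0703, 0.0416, 0.0259, 0.0059, 0.0010],
                    [86.0815, 79.1634, 66.7453, 55.2777, 46.2646, 39.0050, 20.1885, 7.4456]),
    "cluttered": ([8.8121, 5.7345, 3.2769, 2.0337, 1.3750, 0.9926, 0.3792, 0.0996],
                  [92.1336, 90.3412, 86.7028, 80.7665, 74.0090, 67.9076, 49.4160, 26.3202]),
    "combined": ([1.8614, 1.1161, 0.6030, 0.3646, 0.2415, 0.1709, 0.0619, 0.0158],
                 [89.6917, 85.8094, 78.6127, 70.4451, 62.7993, 56.2827, 37.7543, 19.0952]),
}

def pd_at_pfa(pfa_pct, pd_pct, target_pfa_pct=0.1):
    """Linearly interpolate P(D) (%) at a target P(FA) (%) between ROC points."""
    # np.interp requires increasing x, so reverse the threshold-ordered columns.
    x = np.asarray(pfa_pct, dtype=float)[::-1]
    y = np.asarray(pd_pct, dtype=float)[::-1]
    return float(np.interp(target_pfa_pct, x, y))

for name, (pfa, pd_vals) in TABLE_1.items():
    print(f"{name}: P(D) = {pd_at_pfa(pfa, pd_vals):.0f}% at P(FA) = 0.1%")
# uncluttered: P(D) = 61% at P(FA) = 0.1%
# cluttered: P(D) = 26% at P(FA) = 0.1%
# combined: P(D) = 44% at P(FA) = 0.1%
```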
Figure 12 is an example of a cluttered time period. The detector output statistics are depicted for two different threshold levels, which are accentuated with a horizontal orange line in the middle and bottom panels. The dashed blue lines on the spectrogram indicate the locations of the ground truth selections. When the threshold level is set at 0.005, as in the middle panel of Figure 12, the number of false alarms, even in a cluttered time period, is acceptably low. The detector does hit several of the ground truth clicks; however, at this threshold the detector misses more Ziphius clicks than it correctly detects. When the threshold is lowered by an order of magnitude, as in the bottom panel, the detector hits every ground truth click with zero misses. The tradeoff is a significant increase in false alarms, because the threshold level now lies within the clutter. These low values of detector output may contain Ziphius clicks; however, the statistics declare them false alarms when compared to the ground truth. The ground truth is a conservative estimate because a Ziphius click cannot be visually detected when it is embedded in the noise. Compared against a perfect ground truth, the statistics for the cluttered data would contain fewer false alarms, and the cluttered ROC curve would shift significantly to the left.
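The threshold behavior in Figure 12 follows from how a matched-filter detector produces hits: the data are cross-correlated with the kernel, and peaks of the output that reach the threshold are declared detections. The sketch below is a generic matched filter, not the detector implementation used in this research. It assumes a hypothetical 200 kHz sampling rate, a synthetic Hann-windowed 30 kHz kernel of about 390 µs, a unit-energy kernel normalization, and a hypothetical peak-merging rule; because of the normalization, its threshold values are not on the same scale as those in Table 1.

```python
import numpy as np

def matched_filter_output(x, kernel):
    """Cross-correlate the data with a unit-energy copy of the kernel."""
    k = np.asarray(kernel, dtype=float)
    k = k / np.linalg.norm(k)
    return np.correlate(np.asarray(x, dtype=float), k, mode="same")

def threshold_hits(output, threshold, min_separation):
    """Indices where |output| reaches the threshold, merged so each cluster yields one hit.

    Samples closer than min_separation to the current hit are treated as the same
    click, and the strongest sample in the cluster is kept (a hypothetical rule).
    """
    env = np.abs(output)
    hits = []
    for i in np.flatnonzero(env >= threshold):
        if hits and i - hits[-1] < min_separation:
            if env[i] > env[hits[-1]]:
                hits[-1] = i
            continue
        hits.append(i)
    return np.array(hits, dtype=int)

fs = 200_000                       # hypothetical sampling rate (Hz)
n_k = 78                           # about 390 µs at fs
t = np.arange(n_k) / fs
kernel = np.hanning(n_k) * np.sin(2 * np.pi * 30_000 * t)  # synthetic stand-in kernel

rng = np.random.default_rng(0)
x = 0.01 * rng.standard_normal(20_000)
for start in (2_000, 12_000):      # two synthetic clicks embedded in noise
    x[start:start + n_k] += kernel

hits = threshold_hits(matched_filter_output(x, kernel), threshold=1.0, min_separation=n_k)
print(hits)  # two indices, one near each embedded click
```

Lowering `threshold` in this sketch toward the noise level of the output has the same effect as the bottom panel of Figure 12: misses disappear, but threshold crossings from noise or clutter begin to appear as additional hits.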
[Figure 12 panels: Corresponding Detector Output: 5E-3 Threshold (middle); Corresponding Detector Output: 5E-4 Threshold (bottom); vertical axes: Detector Output]



Figure 12. Detector output statistics for a cluttered time period; The upper panel is the 
spectrogram with ground truth click identifications marked by the dashed blue 
line. The middle panel is the corresponding detector output for a given threshold 
and the bottom panel is the corresponding detector output for a lowered threshold. 
The threshold level is denoted by the solid orange line on the middle and bottom 
panels. In the middle panel, the detector misses several of the ground truth click 
identifications; however, the false alarm rate is very low. The detector is able to 
hit all of the ground truth click identifications with no misses when the threshold 
is at the lowest level; however, there is a significant increase in false alarms. 
In comparison to the cluttered time periods, the detector output statistics indicated much lower false alarm rates when the detector was processing uncluttered data. Figure 13 is an uncluttered example wherein the false alarm rate remains low even at a threshold level of 0.0005. The ground truth for the uncluttered time periods is also conservative in comparison to what a perfect ground truth would indicate. As in the cluttered data, this inherent flaw causes a resultant increase in the false alarm rate. The availability of a perfect ground truth would lower the P(FA) and shift the uncluttered ROC curve to the left. However, the shift would not be as significant as the shift of the cluttered data ROC curve.

The unavoidable ground truth bias does not alone account for the detector's performance failures. Even in uncluttered data, the detector has displayed limitations in the presence of a vocalizing Cuvier's beaked whale. Figure 13 exemplifies the detector's failure to hit a ground truth click even at the lowest analyzed threshold level. In this instance, the detector correctly hits 10 of 11 ground truth clicks within a distinct click pattern. Unexpectedly, the detector fails to hit one of the seemingly stronger clicks within the click train. This statistical miss could potentially be a consequence of multipath or other environmental effects. The cross-correlation of the kernel with the SCORE click ensemble, shown previously in Figure 5, verifies this decrease in the correlation value when multipath effects are present. However, the miss can most likely be attributed to the low signal-to-noise ratio (SNR).
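The multipath explanation can be illustrated with a toy calculation: by the Cauchy-Schwarz inequality, a normalized cross-correlation reaches 1.0 only when the received waveform is a scaled copy of the kernel, so an overlapping reflection necessarily lowers the peak. The sketch assumes a synthetic Hann-windowed 30 kHz kernel, a 200 kHz sampling rate, a 23-sample echo delay, and an echo amplitude of 0.6; all of these are hypothetical and are not derived from the SCORE data or from Figure 5.

```python
import numpy as np

def normalized_peak_correlation(x, kernel):
    """Peak |cross-correlation| divided by the norms of the segment and the kernel (at most 1)."""
    x = np.asarray(x, dtype=float)
    k = np.asarray(kernel, dtype=float)
    c = np.correlate(x, k, mode="full")
    return float(np.max(np.abs(c)) / (np.linalg.norm(x) * np.linalg.norm(k)))

fs = 200_000                        # hypothetical sampling rate (Hz)
n_k = 78                            # about 390 µs at fs
t = np.arange(n_k) / fs
kernel = np.hanning(n_k) * np.sin(2 * np.pi * 30_000 * t)

direct = np.concatenate([kernel, np.zeros(40)])              # direct-path click only
echo = np.concatenate([np.zeros(23), kernel, np.zeros(17)])  # delayed reflection, same length

clean_value = normalized_peak_correlation(direct, kernel)
multipath_value = normalized_peak_correlation(direct + 0.6 * echo, kernel)
print(round(clean_value, 3), round(multipath_value, 3))  # clean is 1.0; multipath is lower
```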
Figure 13. Uncluttered data example portraying the detector's limitations with a low SNR; The upper panel is the spectrogram of an uncluttered time period, with the ground truth click identifications emphasized with the dashed blue line. The lower panel is the corresponding detector output for a threshold of 5E-4, which is depicted with the solid orange line.


Another shortcoming of the detector is its performance when other marine mammals are vocalizing within the same time period as Cuvier's beaked whales. Figure 14 is a designated cluttered time period wherein there appears to be delphinid activity as well as Ziphius clicks. At a threshold of 0.001, shown in the bottom panel, the detector hit each of the ground truth clicks; however, it also hit the apparent delphinid clicks. The correlation values for the non-Ziphius vocalizations vary throughout the time period, which makes it difficult to select a threshold that will still detect the Cuvier's clicks while correctly rejecting the undesired vocalizations. Increasing the threshold, shown in the middle panel of Figure 14, improves the detector's performance by dramatically lowering the number of false alarms. However, at this particular threshold level, there are several missed detections.





Figure 14. Detector performance in the presence of delphinid activity: The top panel is a 
spectrogram with the ground truth identifications marked with dashed blue lines. 
The middle and lower panels display the corresponding detector output at a given 
threshold, marked with the orange line. The P(D) is better at the lower threshold; 
however, the P(FA) increases as well. 
The detector performance was degraded when other marine mammals vocalized within the same time period as a Cuvier's beaked whale. In spite of this, the detector performed well in the presence of multiple Ziphius. Figure 15 portrays an ICI that is approximately one half of the routinely observed 0.05 s ICI. The shortened ICI and alternating magnitude strengths on the spectrogram suggest that two Cuvier's beaked whales are vocalizing intermittently. This example also depicts the conservative bias inherent to the ground truth. The false alarms in the initial portion of this window most likely contain Ziphius clicks that were visually indiscernible.



[Figure 15 lower panel title: Corresponding Detector Output: 1E-3 Threshold]
Figure 15. Detector performance in the presence of two Cuvier's beaked whales; The upper panel is the spectrogram and the ground truth click identifications are emphasized with the dashed blue lines. The lower panel is the corresponding detector output. The detector performs well in the presence of multiple Ziphius.
The detector performance was also analyzed during time periods where no
Ziphius activity was observed. The detector performed perfectly in these instances where 
the ground truth contained zero clicks. The respective detector output statistics indicated 
zero hits, zero misses, and zero false alarms at all analyzed thresholds during these 
known quiet periods. 
IV. CONCLUSIONS


The unique spectral and temporal structures of Cuvier's beaked whales' vocalizations are favorable for automated detection via a matched filter. A kernel was generated for two different types of acoustic recording devices: a HARP and a SCORE hydrophone. The kernel, which was generated from data band-pass filtered between 15 and 40 kHz, had a 390 µs duration. This is slightly greater than the click durations cited in recent research: 175 µs (Johnson et al., 2004), 200 µs (Zimmer et al., 2005), 250 µs (Johnson et al., 2004, and Tyack et al., 2006), and 300 µs (Pavan et al., 2006). This difference can likely be attributed to the available bandwidth of the acoustic recording instrument or to the nature of the comparison. An acoustic recording instrument with a narrower bandwidth would capture a shorter duration of the click than an instrument with a wider bandwidth. Also, this is not a direct click-to-click comparison. The kernel is a compilation of 150 different clicks that were statistically analyzed to extract one dominant component, which accounted for 73% of the variance.
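Extracting one dominant component from an ensemble of clicks can be sketched with a singular value decomposition, which yields the principal components of the ensemble and the share of the total attributable to each. The sketch below assumes the clicks are already aligned, band-pass filtered, and stacked as rows of equal length; the `center` option, the sign convention, and the synthetic 150-click ensemble are illustrative assumptions rather than the exact PCA procedure used here. Without centering, the leading share is a fraction of total energy; with `center=True` it is a fraction of variance about the ensemble mean.

```python
import numpy as np

def pca_kernel(ensemble, center=False):
    """Leading principal component of an (n_clicks, n_samples) ensemble.

    Returns (kernel, share), where kernel has unit norm and share is the fraction
    of the ensemble's total (energy or variance) carried by that component.
    """
    X = np.asarray(ensemble, dtype=float)
    mean_wave = X.mean(axis=0)
    if center:
        X = X - mean_wave                 # variance about the mean waveform
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    share = s**2 / np.sum(s**2)
    kernel = vt[0]
    if np.dot(kernel, mean_wave) < 0:     # SVD sign is arbitrary; match the mean click
        kernel = -kernel
    return kernel, float(share[0])

# Synthetic ensemble: 150 copies of one waveform with random amplitude plus noise.
rng = np.random.default_rng(1)
n_k = 78
base = np.hanning(n_k) * np.sin(2 * np.pi * 0.15 * np.arange(n_k))
amps = rng.uniform(0.5, 1.5, size=150)
ensemble = amps[:, None] * base + 0.05 * rng.standard_normal((150, n_k))

kernel, share = pca_kernel(ensemble)
print(f"leading component share: {share:.2f}")
```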

The consistently observed ICI in this study was approximately 0.05 s. This observation disagrees with other recently published research: 0.38 s (Pavan et al., 2006), 0.40 s (Johnson et al., 2004 and 2006, and Tyack et al., 2006), and 0.43 s (Zimmer et al., 2005). The ICI was not the focus of this project; it was, however, a consistently observed phenomenon during the ensemble and ground truth creation. A difference of an entire order of magnitude is a significant result. One possible explanation is that multiple animals were vocalizing intermittently. However, the extremely regular and repetitive intervals are suggestive of a single animal. It is also possible that these vocalizations are made by a species other than Ziphius cavirostris, or that this species simply vocalizes at varying ICIs. Further exploration of this unexpected disparity was beyond the scope of this research.

A total of 174.8 minutes of data from NPS's DAS recordings at a SCORE hydrophone were reviewed. Specific criteria were adhered to in an attempt to limit subjectivity. The objective selection criteria included spectrogram energy between 22.5 and 35 kHz and/or a distinctive click train pattern, and a single selection per cluster. Following these criteria, 28,434 clicks were selected for inclusion in the ground truth. Time segments in which the data recordings were interrupted, or in which the signal could not be confidently discerned because of indistinguishable clutter, were removed; a total of 28.89 minutes were purposefully excluded from the statistical analysis. The remaining ground truth was then separated into cluttered and non-cluttered categories to produce separate ROC curves.
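The ground truth energy criterion was applied by visual inspection of spectrograms. As a hypothetical automated proxy for that criterion (not the procedure used in this research), the fraction of a segment's spectral energy falling between 22.5 and 35 kHz can be computed with an FFT. The sketch assumes a 200 kHz sampling rate, a Hann window, and synthetic tones for illustration.

```python
import numpy as np

def band_energy_fraction(segment, fs, f_lo=22_500.0, f_hi=35_000.0):
    """Fraction of a segment's windowed spectral energy between f_lo and f_hi (Hz)."""
    x = np.asarray(segment, dtype=float)
    x = x - x.mean()                                  # remove any DC offset
    spec = np.abs(np.fft.rfft(x * np.hanning(x.size))) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    in_band = (freqs >= f_lo) & (freqs <= f_hi)
    total = spec.sum()
    return float(spec[in_band].sum() / total) if total > 0 else 0.0

fs = 200_000                                          # hypothetical sampling rate (Hz)
t = np.arange(1024) / fs
print(band_energy_fraction(np.sin(2 * np.pi * 30_000 * t), fs))  # close to 1
print(band_energy_fraction(np.sin(2 * np.pi * 10_000 * t), fs))  # close to 0
```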

Despite all attempts to produce a precise ground truth, it was an inherently conservative estimate. At times, the detector detected signals that could not be visually discerned. The cluttered time periods were affected by this bias more than the uncluttered time periods. During the cluttered time periods, actual signals became hidden in the noise, causing misses in the ground truth. These ground truth misses became false alarms in the detector evaluation. If this bias could be removed, the measured detector performance would improve: significantly in the cluttered time periods, and only slightly in the uncluttered time periods.

At an acceptable false alarm rate of 0.1%, the detector's overall P(D) is 44%. The detector performed best in uncluttered time periods, with a 61% P(D) at a 0.1% false alarm rate. The detector's performance degrades in cluttered data, where it has a P(D) of 26% at a 0.1% false alarm rate. The detector performance is perfect in the absence of clicks. The detector does not distinguish well between non-ziphiid vocalizations and Cuvier's beaked whales' vocalizations.

The greatest problem for the detector is the significant number of false alarms from marine mammals other than the target species. The detector clearly detects clicks; however, it is not certain which species' clicks are being detected, because the differences cannot be visually discerned at the time scale used. The kernel that was developed from the second subset of the ensemble, which was band-pass filtered from 15 to 60 kHz, was not utilized or assessed in this research. Assessing this second kernel with HARP data recordings could provide further insight into the competency of the detector.

Potential follow-on research that could build upon the premises established in this 
thesis includes: 

* An in-depth investigation of the ICI disparities between this thesis and other 
research 

* Increasing the ensemble sample size, performing a PCA, and then comparing the 
resultant kernel to the kernel used in this research 

* With the availability of an enhanced kernel, repeating the detector performance 
analysis 

* Applying the detector to other SCORE hydrophones within the NPS DAS 

* Duplicating this research with the unevaluated kernel and assessing the detector’s 
performance when processing NPS and/or SIO HARP data 

* Comparing temporally coincident detector results from a SCORE hydrophone 
with results from the nearby SIO site H HARP 

* Assessing the classification performance of the kernel to correctly identify a 
Ziphius click from other marine mammals’ vocalizations 

* A study utilizing the optimum detector to assess the geographic call density 
distribution 

* A study utilizing the optimum detector to assess seasonal and/or diurnal variability in call patterns